In Bayesian statistics, one's prior beliefs about underlying model parameters are revised with the
information content of observed data from which, using Bayes' rule, a posterior belief is obtained. A
non-trivial example taken from the isospin analysis of $B \to PP$ ($P = \pi$ or $\rho$) decays in heavy-flavor
physics is chosen to illustrate the effect of the naive "objective" choice of flat priors in a
multi-dimensional parameter space in the presence of mirror solutions. It is demonstrated that the posterior
distribution for the parameter of interest, the phase $\alpha$, strongly depends on the choice of the
parameterization in which the priors are uniform, and on the validity range in which the (un-normalizable)
priors are truncated. We prove that the most probable values found by the Bayesian treatment do
not coincide with the explicit analytical solutions, in contrast to the frequentist approach. It is
also shown in the appendix that the $\alpha \to 0$ limit cannot be consistently treated in the Bayesian
paradigm, because the latter violates the physical symmetries of the problem.



I. INTRODUCTION 

In Bayesian statistics, probability is a measure of one 
person's state of knowledge (also called degree of be- 
lief) of the plausibility of a proposition given incomplete 
knowledge at a given time. Another person may have 
a different degree of belief in the same proposition, and 
so have a different probability. The only constraint is 
that the probabilities chosen by a single person should 
be consistent ("coherent"): they should obey all the axioms
of probability [1]. Bayes' rule is understood as a
revision process, by which a prior probability is changed 
into a new one, the posterior probability, due to input 
information provided by the data. 

The result of any inference problem is the posterior 
distribution of the quantity of interest. Bayesian mod- 
els require the specification of prior distributions for all 
unknown parameters, expressing the actual personal de- 
grees of belief based on all the available information prior 
to updating one's degree of belief with the information 
content of the data. In the case where prior knowledge 
about model parameters is unavailable, the specification 
of prior distributions is never unequivocal. 

Neither Bayesian statistics nor any other framework provides fundamental rules
for obtaining the prior probability about the parameters.¹ The specification of a
prior distribution may be possible in some simple cases but is impractical in
complicated problems if there are many parameters. In practice, especially when
nothing or very little is known about the parameters, most Bayesian analyses are
performed with so-called "non-informative priors" [4]. An obvious candidate for a
non-informative prior is a flat prior. The notion of a flat prior is not
well-defined because a flat prior of one parameter does not imply a flat prior on
a transformed version of that parameter. Prior density distributions are not
transformation-invariant, because they depend on the metric. For example, a
uniform distribution of $\theta$ does not lead to a uniform distribution of
$\sin(\theta)$. Thus, there is a fair amount of arbitrariness in how ignorance is
parameterized, which will affect the posterior probability and hence the result.
Moreover, there is a fundamental difference rarely acknowledged between knowing
that a uniform prior probability distribution in the range $[\theta_a, \theta_b]$
has been assigned to the value of a parameter $\theta$ as a result of positive
knowledge, and not knowing anything about $\theta$ with the exception of its
admissible range. These are two fundamentally different states of knowledge. It
is often claimed, though without any proof, that the relative prior dependence of
the posterior probability distribution is reduced as the statistical information
from the measured data is increased.

¹ It is not surprising that no rules are given because knowledge is a very poorly
defined concept. In the personalistic Bayesian approach developed by F.P. Ramsey,
B. de Finetti and L.J. Savage, personal degrees of belief are represented
numerically by betting quotients: one should assign and manipulate probabilities
so that one cannot be made a sure loser in betting based on them [2]. Betting
cannot be used to measure the strength of someone's belief in a universal
scientific law or theory [3].

In the physical sciences, the invariance of conclusions drawn from data under a
particular parameter choice is a fundamental concept. Furthermore, it is
questionable [5] "whether scientists have prior degrees of beliefs in the
hypotheses they investigate and whether, even if they do, it is desirable to have
them figure centrally in learning from data to science. In science, it seems, we
want to know what the data are saying, quite apart from the opinions we start out
with."
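The non-invariance of a flat prior under a change of variable, noted above for $\theta$ and $\sin(\theta)$, can be checked numerically. The following is a minimal sketch, assuming a prior uniform in $\theta$ on $[0, \pi/2]$ (the range and the function name `fraction_above` are illustrative choices, not taken from the paper). If the prior were also flat in $\sin(\theta)$, half of the probability would lie above $\sin(\theta) = 0.5$; a prior flat in $\theta$ instead puts $P(\theta > \pi/6) = 2/3$ there.

```python
import math
import random


def fraction_above(threshold, n_samples=200_000, seed=1):
    """Fraction of draws with sin(theta) > threshold, theta ~ Uniform(0, pi/2)."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(n_samples)
        if math.sin(rng.uniform(0.0, math.pi / 2)) > threshold
    )
    return hits / n_samples


if __name__ == "__main__":
    # Flat in sin(theta) would give 0.5; flat in theta gives about 2/3.
    print(f"P(sin(theta) > 0.5) = {fraction_above(0.5):.3f}")
```

The same data combined with these two "flat" priors would therefore start from different prior weights for the same physical region.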

The mathematical content of this paper being rather 
simple, its purpose is to illustrate with a concrete use 
case how strongly prior-dependent the Bayesian treat- 
ment can be. The chosen example is taken from the field 
of particle physics, more specifically from recent results 
discussed in the domain of CP violation. The extraction
of the CKM phase $\alpha$ from the isospin analysis of
$B \to PP$ ($P = \pi$ or $\rho$) decays is used as an illustration
of a Bayesian analysis at work with flat priors in a
multi-dimensional parameter space in the presence of mirror
solutions. Troublesome results are obtained [6].

We begin by introducing the analysis formalism and the statistical approaches
used to interpret the experimental results. We present the results of the
so-called $B \to \pi\pi$ isospin analysis (see below) in several
parameterizations, finding that the Bayesian method leads to very different
conclusions depending on the choice of the parameterization. We then work out a
simpler two-dimensional example that bears some similarities with the extraction
of $\alpha$, namely the existence of mirror solutions, and show why the Bayesian
treatment amounts to an unacceptable interpretation of fundamental physics
parameters. Finally, in the appendix, we explain in detail the role of the
$\alpha \to 0$ limit, its associated mathematical properties and its relation to
CP violation and new physics. We show that the Bayesian treatment leads to an
unrecoverable divergence when one parameterizes the Standard Model amplitudes by
their real and imaginary parts, i.e., when one uses the parameterization that is
the most natural from the point of view of the computation of Feynman diagrams.
Readers well aware of the basics of the frequentist and Bayesian statistical
treatments may skip Sections III, IV and V.



II. ANALYSIS FORMALISM 

The experimental framework consists of the measurement of six observables: three
branching fractions and three asymmetries (see, e.g., Ref. [7]). The three
branching fractions are related to the three $B$ meson decays:
$B^0 \to \pi^+\pi^-$, $B^0 \to \pi^0\pi^0$, $B^\pm \to \pi^\pm\pi^0$, where the
average is implicitly taken between the two CP-conjugate decays; $B^0$ and
$\bar{B}^0$, and $B^+$ and $B^-$, respectively. The three asymmetries are
quantities which would vanish in the absence of CP violation (charge conjugation
times spatial parity): these quantities modulate the time dependence of the
neutral $B$-meson decays. They are denoted $S^{+-}$, $C^{+-}$ and $C^{00}$.

The use of the isospin analysis to extract fundamental parameters from the
$B \to \pi\pi$ observables is a well known problem that was first solved in
1990 [8]. Assuming isospin symmetry (an approximate SU(2) flavor symmetry of the
strong interaction, which is known to hold to better than a few percent
accuracy), the two pion final state can be represented as a superposition of
isospin zero ($I = 0$) and isospin two ($I = 2$) eigenstates. Within the Standard
Model, the $B^0$-decay amplitudes can then be written as

$$
\begin{aligned}
A^{+-} &= A(B^0 \to \pi^+\pi^-) = e^{-i\alpha}\,T^{+-} + P\,, \\
\sqrt{2}\,A^{00} &= \sqrt{2}\,A(B^0 \to \pi^0\pi^0) = e^{-i\alpha}\,T^{00} - P\,, \\
\sqrt{2}\,A^{+0} &= \sqrt{2}\,A(B^+ \to \pi^+\pi^0) = e^{-i\alpha}\,(T^{00} + T^{+-})\,,
\end{aligned}
\tag{1}
$$

where a peculiar phase convention has been taken (in which $q/p$, the
$B^0\bar{B}^0$ mixing phase shift, is equal to one). The triangular relation
$\sqrt{2}A^{+0} = \sqrt{2}A^{00} + A^{+-}$ that follows from the parameterization
above is a consequence of the isospin symmetry, and the only information that we
have on the amplitudes without any additional hypothesis; the "P" (the so-called
"penguin") term in the neutral mode comes from $\Delta I = 1/2$ operators that
cannot contribute to the $\Delta I = 3/2$ charged transition, hence the absence
of a "P" term in $A^{+0}$. This notation makes explicit the presence of CP
violation through the CP-odd weak-interaction phase $\alpha$ (which is the
parameter of main interest), while the other (hadronic) parameters are
CP-conserving complex numbers. The CP-conjugated amplitudes are thus obtained
from Eq. (1) by a sign flip $\alpha \to -\alpha$. In the following, the
parameterization from Eq. (1) will be referred to as the "Standard Model"
parameterization.
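The structure of Eq. (1) can be made concrete with a few lines of code. The sketch below evaluates the three amplitudes for one set of hadronic parameters and checks two properties stated in the text: the triangular relation, and the fact that the CP-conjugate amplitudes follow from the sign flip of the weak phase. The numerical values of the weak phase and of the complex tree and penguin amplitudes are hypothetical and chosen only for illustration.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)


def amplitudes(alpha, t_pm, t_00, p):
    """Return (A+-, A00, A+0) from Eq. (1); alpha in radians, the rest complex."""
    weak = cmath.exp(-1j * alpha)
    a_pm = weak * t_pm + p
    a_00 = (weak * t_00 - p) / SQRT2
    a_p0 = weak * (t_00 + t_pm) / SQRT2
    return a_pm, a_00, a_p0


# Hypothetical parameter values, chosen only for illustration.
ALPHA = math.radians(100.0)
T_PM, T_00, P = 1.0 + 0.0j, 0.4 - 0.3j, 0.2 + 0.25j

a_pm, a_00, a_p0 = amplitudes(ALPHA, T_PM, T_00, P)
abar_pm, abar_00, abar_p0 = amplitudes(-ALPHA, T_PM, T_00, P)  # CP conjugates

# Triangular relation: sqrt(2) A+0 = sqrt(2) A00 + A+-
triangle_residual = abs(SQRT2 * a_p0 - (SQRT2 * a_00 + a_pm))
# The charged mode has no penguin term, so Abar+0 / A+0 = exp(2 i alpha).
phase_residual = abs(abar_p0 / a_p0 - cmath.exp(2j * ALPHA))
```

Both residuals vanish up to rounding for any choice of the inputs, since they follow from the algebraic form of Eq. (1) rather than from the particular numbers used here.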

The observables that are currently measured by the $B$-factory experiments BABAR
and Belle are the CP-averaged branching fractions
$\mathcal{B}^{ij} \propto \tau_B\,(|A^{ij}|^2 + |\bar{A}^{ij}|^2)/2$, where
$\tau_B$ is the $B^0$- or $B^+$-meson lifetime depending on the modes, the direct
CP asymmetries
$C^{ij} = (|A^{ij}|^2 - |\bar{A}^{ij}|^2)/(|A^{ij}|^2 + |\bar{A}^{ij}|^2)$ and the
$B^0\bar{B}^0$-mixing-induced CP asymmetry $S^{+-}$, defined from
$\mathrm{Im}(\bar{A}^{+-}/A^{+-})$. This adds up to six independent constraints on
the six independent parameters, namely $\alpha$ and either the modulus and
argument, or equivalently the real and imaginary parts, of $T^{+-}$, $T^{00}$ and
$P$ (one overall phase being irrelevant, it can be fixed to any value without
altering the observables). Thus the system of equations is just constrained. It
can be inverted explicitly [9], which will be referred to in the following as the
"explicit solution" parameterization.² It is assumed throughout this paper that
all observables are obtained from Gaussian measurements, and that there is no
source of uncertainty other than the unknown tree and penguin amplitudes defined
above. The fit of $\alpha$ or of any subset of the unknown parameters is
therefore a classical, well-defined, statistical problem.

² One can extract the angle $\alpha$, up to discrete ambiguities, provided
electroweak penguin contributions are negligible ($P^{\rm EW} = 0$). The explicit
solutions in terms of $\alpha$ are given by [10, 11]

$$
\tan\alpha = \frac{\sin(2\alpha_{\rm eff})\,\bar{c} + \cos(2\alpha_{\rm eff})\,\bar{s} + s}
                  {\cos(2\alpha_{\rm eff})\,\bar{c} - \sin(2\alpha_{\rm eff})\,\bar{s} + c}\,,
$$

where all quantities on the right hand side can be expressed in terms of the
observables as follows:

$$
\sin(2\alpha_{\rm eff}) = \frac{S^{+-}}{\sqrt{1 - (C^{+-})^2}}\,, \qquad
\cos(2\alpha_{\rm eff}) = \pm\sqrt{1 - \sin^2(2\alpha_{\rm eff})}\,,
$$

What makes the isospin analysis an interesting example, besides its non-linearity
due to trigonometric functions, is the presence of a high-order exact degeneracy
between mirror solutions. Indeed it can be shown explicitly (see, e.g., Ref. [9])
that there are eight parameter sets that give exactly the same value for the
observables when one restricts $\alpha$ to the range $[0, \pi]$ (eight other sets
are found trivially through the transformation $\alpha \to \alpha + \pi$,
$T^{ij} \to -T^{ij}$ that leaves the system invariant). Even with infinite
statistics, it is not possible from a given set of measurements to determine
which of the various mirror solutions is the true one.
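The trivial part of the degeneracy, the invariance under $\alpha \to \alpha + \pi$ with a simultaneous sign flip of the tree amplitudes, can be verified directly on the six observables. The sketch below is only an illustration under stated assumptions: the parameter values are hypothetical, the branching fractions are computed up to a common normalization, `tau_ratio` stands for the lifetime ratio $\tau_{B^+}/\tau_{B^0}$, and $S^{+-}$ is evaluated with the standard normalized form $2\,\mathrm{Im}\lambda/(1+|\lambda|^2)$, $\lambda = \bar{A}^{+-}/A^{+-}$, which is an assumption about the normalization rather than a formula quoted from the text.

```python
import cmath
import math


def observables(alpha, t_pm, t_00, p, tau_ratio=1.0):
    """Return (B+-, B+0, B00, C+-, S+-, C00) up to a common normalization.

    tau_ratio is tau(B+)/tau(B0). S+- uses 2 Im(lam) / (1 + |lam|^2) with
    lam = Abar+- / A+- (assumed normalization, with q/p = 1).
    """
    def amps(a):
        weak = cmath.exp(-1j * a)
        return (weak * t_pm + p,
                (weak * t_00 - p) / math.sqrt(2.0),
                weak * (t_00 + t_pm) / math.sqrt(2.0))

    (a_pm, a_00, a_p0), (b_pm, b_00, b_p0) = amps(alpha), amps(-alpha)

    def rate(a, abar):
        return 0.5 * (abs(a) ** 2 + abs(abar) ** 2)

    def direct_cp(a, abar):
        return (abs(a) ** 2 - abs(abar) ** 2) / (abs(a) ** 2 + abs(abar) ** 2)

    lam = b_pm / a_pm
    s_pm = 2.0 * lam.imag / (1.0 + abs(lam) ** 2)
    return (rate(a_pm, b_pm), tau_ratio * rate(a_p0, b_p0), rate(a_00, b_00),
            direct_cp(a_pm, b_pm), s_pm, direct_cp(a_00, b_00))


# Hypothetical parameter point.
alpha, t_pm, t_00, p = math.radians(100.0), 1.0 + 0.0j, 0.4 - 0.3j, 0.2 + 0.25j
original = observables(alpha, t_pm, t_00, p)
mirrored = observables(alpha + math.pi, -t_pm, -t_00, p)
max_shift = max(abs(x - y) for x, y in zip(original, mirrored))
```

Here `max_shift` is zero up to rounding, because the transformation leaves every amplitude unchanged. The eightfold degeneracy within $[0, \pi]$ is of a different nature and requires the explicit solutions of Footnote 2.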

The same formalism can be applied (to a good approximation) to the quasi two-body
decay $B \to \rho\rho$. However in this case, only an upper bound on the
branching fraction to $\rho^0\rho^0$ is currently available, which makes the
isospin analysis an under-constrained system. It has been shown in the literature
[9, 12] that one obtains bounds on the phase $\alpha$ in such a case.

We consider another useful parameterization of the decay amplitudes, which has
been proposed in Ref. [10]. It will be referred to as the "Pivk-LeDiberder"
parameterization. One introduces six parameters $\alpha$, $\alpha_{\rm eff}$,
$\mu$, $a$, $\bar{a}$ and $\Delta$ through the definitions

The above description is the simplest one that makes explicit the two crucial
ingredients of the isospin analysis, namely the triangular relation between the
amplitudes and the fact that $2\alpha$ is the phase difference between
$\bar{A}^{+0}$ and $A^{+0}$.



III. FREQUENTIST ANALYSIS 

"Frequentist statistics provides the usual tools for re- 
porting objectively the outcome of an experiment with- 
out needing to incorporate prior beliefs concerning the 
parameter being measured or the theory being tested. As 
such they are used for reporting essentially all measure- 
ments and their statistical uncertainties in High Energy 
Physics" [13]. The Frequentist sees probability as the 
long-run relative frequency of occurrence. Hence, the 
frequentist analysis assumes that a population mean is 
real, but unknown, and unknowable, and can only be 
estimated from the available data. 

We neglect in the following the occurrence of physical boundaries for the true
values, which greatly simplifies the computation of frequentist confidence
levels. We adopt a $\chi^2$-like notation and define

$$
\chi^2(\theta) \equiv -2\ln\left(\mathcal{L}_{\{x\}}(\theta)\right)\,, \tag{3}
$$

where the likelihood function, $\mathcal{L}$, quantifies the agreement between
the measured observables, $\{x\}$, and their theoretical counterparts,
$\{f(\theta)\}$. The $\theta$ parameters are the unknowns of the theory, e.g.,
for the Pivk-LeDiberder parameterization one has
$\theta = \{\alpha, \alpha_{\rm eff}, \mu, a, \bar{a}, \Delta\}$.

Under these assumptions, and neglecting experimental correlations, the
likelihood components of $\mathcal{L}$ are independent Gaussian distribution
functions

$$
\mathcal{L}_{\{x_i\}}(\theta) \propto \frac{1}{\sigma_i}
\exp\left[-\frac{1}{2}\left(\frac{x_i - f_i(\theta)}{\sigma_i}\right)^2\right]\,, \tag{4}
$$

each with a standard deviation given by the statistical uncertainty $\sigma_i$ on
the measurement $i$.³ In this case, using incomplete $\Gamma$ functions (as
computed, e.g., with "Prob", the well known routine from the CERN library), one
can infer a confidence level (CL) from the above $\chi^2$ value as follows

$$
\mathrm{CL} = \mathrm{Prob}(\chi^2(\theta), N_{\rm dof})
= \frac{1}{\sqrt{2^{N_{\rm dof}}}\,\Gamma(N_{\rm dof}/2)}
\int_{\chi^2(\theta)}^{\infty} e^{-t/2}\, t^{N_{\rm dof}/2 - 1}\, dt\,. \tag{5}
$$
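The confidence level of Eq. (5) is the upper-tail probability of a $\chi^2$ distribution, and for an integer number of degrees of freedom it can be evaluated with elementary functions. The sketch below is not the CERN library routine itself: `chi2_prob` is a hypothetical stand-in that starts from the closed forms for one and two degrees of freedom and applies the recurrence $Q(a+1, z) = Q(a, z) + z^a e^{-z}/\Gamma(a+1)$ of the regularized upper incomplete $\Gamma$ function, with $a = N_{\rm dof}/2$ and $z = \chi^2/2$.

```python
import math


def chi2_prob(chi2, n_dof):
    """Upper-tail chi-square probability of Eq. (5) for integer n_dof >= 1."""
    if n_dof < 1:
        raise ValueError("n_dof must be a positive integer")
    if chi2 <= 0.0:
        return 1.0
    z = 0.5 * chi2
    if n_dof % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Step a -> a + 1 until a reaches n_dof / 2.
    for _ in range((n_dof - 1) // 2):
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


if __name__ == "__main__":
    for n_dof in (1, 2, 3):
        print(n_dof, chi2_prob(1.0, n_dof))
```

For one degree of freedom this reduces to $\mathrm{erfc}(\sqrt{\chi^2/2})$, so that $\chi^2 = 1$ corresponds to the familiar 68.3% interval.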



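² (continued)

$$
c = \sqrt{\frac{\tau_{B^+}}{\tau_{B^0}}}\;
\frac{\frac{\tau_{B^0}}{\tau_{B^+}}\,\mathcal{B}^{+0} + \mathcal{B}^{+-}(1 + C^{+-})/2 - \mathcal{B}^{00}(1 + C^{00})}
     {\sqrt{2\,\mathcal{B}^{+-}\mathcal{B}^{+0}(1 + C^{+-})}}\,,
$$

$$
\bar{c} = \sqrt{\frac{\tau_{B^+}}{\tau_{B^0}}}\;
\frac{\frac{\tau_{B^0}}{\tau_{B^+}}\,\mathcal{B}^{+0} + \mathcal{B}^{+-}(1 - C^{+-})/2 - \mathcal{B}^{00}(1 - C^{00})}
     {\sqrt{2\,\mathcal{B}^{+-}\mathcal{B}^{+0}(1 - C^{+-})}}\,,
$$

$$
s = \pm\sqrt{1 - c^2}\,, \qquad \bar{s} = \pm\sqrt{1 - \bar{c}^2}\,.
$$

The eightfold ambiguity for $\alpha$ in the range $[0, \pi]$ is made explicit by
the three arbitrary signs.

IV. BAYESIAN ANALYSIS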



Bayesian probability, also named personal probability or (more often but less
appropriately) subjective probability, represents one's degree of belief.⁴ It is
thus a summary of one's own opinions about an uncertain proposition, not
something inherent to the system being studied.⁵ Thus, in the personalistic
Bayesian approach, hypotheses or unknowns can never be directly measured or
statistically evaluated. A personal probability statement cannot be proved or
disproved, verified or falsified.⁶

A posterior probability density function (PDF), $P(\theta|\{x\})$, for a model
parameter $\theta$ having observed the data $\{x\}$ is, using Bayes' rule:

$$
P(\theta|\{x\}) = \frac{P(\{x\}|\theta)\,\pi(\theta)}{\int P(\{x\}|\theta)\,\pi(\theta)\,d\theta}\,, \tag{6}
$$

where $P(\{x\}|\theta)$ is nothing more than the likelihood function⁷
($\mathcal{L}_{\{x\}}(\theta)$) of the true parameter value $\theta$ taken at the
observed data $\{x\}$. The posterior PDF in $\theta$ is obtained by multiplying
$\mathcal{L}$ by the prior PDF $\pi(\theta)$, the probability density function
for the model parameter $\theta$. In essence, the prior is reweighed according to
the likelihood of the data.⁸

³ In practice, one has to deal with correlated measurements and with additional
experimental and theoretical systematic uncertainties, which are however
irrelevant for the discussion of this paper.

⁴ A belief could be well- or ill-founded, in agreement or disagreement with the
facts, etc.; it is clear that an expression or assertion of that belief has no
necessary connections to the facts. It is not "objective" in the root sense of
being about or dependent upon objects in the real world, but is rather subjective
in the sense of being about or dependent upon the psychological subject [14].

⁵ Bayesian personal probability reflects the scientist's confidence that a
hypothesis is true (among all other rival hypotheses). A scientist's personal
probability for a hypothesis is, then, more a psychological fact about the
scientist than an observer-independent fact about a hypothesis. It is not a
matter of how likely the truth of a hypothesis actually is, but about how likely
the scientist thinks it to be [5, 15]. In other words, in the personalistic
Bayesian viewpoint, "the probability of A is 1/2" is not a statement about A, it
is a statement about the state of mind (opinion) of the person making the
assertion. In a system which defines probability as the individual's degree of
belief in a proposition, it is obvious that there can be no one answer to "what
is the probability of X?" There are as many answers as there are beliefs, and no
answer is better than any other (coherent) answer, since the individual is
theoretically free to hold any opinion whatsoever [14].

⁶ One characteristic of the scientific method is the formulation of testable
hypotheses. The objectivity of scientific statements relies upon the fact that
they can be submitted to tests in a reliable manner and with checkable
assumptions [16]. So, in order to exceed the level of a mere speculation, any
theory of inference about parameters must be exposed, i.e., must be able to make
predictions that can be verified by experiments (or falsified, in K. Popper's
version of the same idea). Hence, adopting a set of axioms does not guarantee a
success in modeling the empirical world: one needs an extra argument, such as
empirical verification, to justify the use of any given set of axioms. The
personalistic Bayesian viewpoint claims that probability statements cannot be
verified (because probability does not exist in an objective sense, in de
Finetti's motto: "Probability does not exist"). The important point here is that
probabilities obtained in this way do not correspond directly to anything
objectively existing in the real world. To be tested a probability proposition
needs to be converted to a statistical proposition, which is verifiable.
Probability statements are not descriptive but judgmental.

⁷ $P(x|\theta)$ is a probability (discrete data) or a PDF (continuous data) as a
function of the data and all possible data are considered, including the data not
observed. If the data are considered fixed (at the measured value), then
$P(x|\theta)$ is no longer a probability or a PDF, it becomes the likelihood
function, $\mathcal{L}_{\{x\}}(\theta) = P(\{x\}|\theta)$, a function of the true
model parameter $\theta$ [17]. The likelihood function is not the PDF of
$\theta$, given $\{x\}$. To turn the likelihood function into the PDF
$P(\theta|\{x\})$, one needs to invoke Bayes' rule where a prior PDF is
mandatory.

It should be emphasized here that the probability associated with the value of a
model parameter cannot be interpreted meaningfully as a frequency of an outcome
of a repeatable experiment. Instead, it is understood to reflect the degree of
belief that the parameters have particular values. Stated otherwise, since the
parameter $\theta$ is not a random variable,⁹ the probability distribution for
$\theta$ is not a probability distribution in the usual frequency sense, i.e.,
one cannot sample from this distribution and obtain various values for $\theta$
[19]. However, the prior PDF $\pi(\theta)$ can be defined with the formal rules
of probabilities and quantifies one's degree of belief about the parameter before
carrying out the experiment, i.e., no matter what the data are. There is no
fundamental recipe for assigning a priori probabilities to parameters. Bayes'
rule, after choosing a certain prior $\pi(\theta)$, only states how the a
posteriori probability changes in the light of the existing experimental data. In
other words, the Bayesian posterior PDF depends not only on the observation
itself, but also on the state of knowledge and beliefs of the observer. As a
consequence, the posterior PDF by itself does not in general provide a useful
summary of the result of the experiment, as it convolves the data with the
personal beliefs needed to construct the prior PDF.

In the case of more than one parameter, $\theta_1, \ldots, \theta_m$, the a
posteriori PDF of, say, the parameter $\theta_1$ is obtained by integrating out
the parameters $\theta_2, \ldots, \theta_m$ to get the marginal PDF

$$
P(\theta_1|\{x\}) = \int P(\theta_1, \theta_2, \ldots, \theta_m|\{x\})\,
d\theta_2 \ldots d\theta_m\,. \tag{7}
$$

By doing so, one has chosen a certain parameterization
$(\theta_1, \ldots, \theta_m)$ and a corresponding metric
$(d\theta_1 \ldots d\theta_m)$.
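The role of the metric in the marginalization of Eq. (7) can be illustrated with a deliberately simple toy problem, which is not the isospin analysis and is introduced here only as an assumption-laden sketch. Two parameters $t_1$ and $t_2$ are constrained by a single hypothetical measurement of their product, $t_1 t_2 = 1 \pm 0.1$, and both are restricted to a truncated range $[0.1, 10]$. The nuisance parameter $t_2$ is integrated out on a grid, once with a prior flat in $t_2$ and once with a prior flat in $\ln t_2$; the prior on $t_1$ is flat in both cases.

```python
import math

SIGMA = 0.1          # hypothetical measurement uncertainty on t1 * t2
LO, HI = 0.1, 10.0   # hypothetical truncation range for both parameters


def likelihood(t1, t2):
    """Gaussian likelihood of one hypothetical measurement t1 * t2 = 1 +- SIGMA."""
    return math.exp(-0.5 * ((t1 * t2 - 1.0) / SIGMA) ** 2)


def marginal_mean(flat_in_log, n1=50, n2=4000):
    """Posterior mean of t1 after integrating t2 out on a midpoint grid."""
    num = den = 0.0
    for i in range(n1):
        t1 = LO + (HI - LO) * (i + 0.5) / n1
        weight = 0.0
        for j in range(n2):
            u = (j + 0.5) / n2
            if flat_in_log:
                t2 = LO * (HI / LO) ** u     # grid uniform in ln(t2)
            else:
                t2 = LO + (HI - LO) * u      # grid uniform in t2
            weight += likelihood(t1, t2)
        num += t1 * weight
        den += weight
    return num / den


if __name__ == "__main__":
    print("flat in t2     :", round(marginal_mean(False), 2))
    print("flat in ln(t2) :", round(marginal_mean(True), 2))
```

With the same likelihood, the marginal posterior of $t_1$ falls roughly like $1/t_1$ in the first case and is roughly flat in the second, so the posterior means differ by more than a factor of two. The data say nothing about $t_1$ alone; the difference comes entirely from the metric chosen for $t_2$.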



⁸ In Bayesian probability, one does not try to test or refute one's
prior probabilities, one simply changes them into posterior prob- 
abilities by Bayesian conditionalization. If the initial assump- 
tions are seriously wrong in some respects, then not only will the 
prior probability function be inappropriate, but all the condi- 
tional probabilities generated from it in the light of new evidence 
will also be inappropriate. To obtain reasonable probabilities in 
such circumstances, it will be necessary to change one's prior 
probability in a much more drastic fashion than Bayesian prob- 
ability allows, and, in effect, introduce a new prior probability 
function [2] . The important point is that even when one's degree 
of belief changes with new evidence, in no way does it show one's 
previous degree of belief to have been mistaken. Furthermore, no 
proof is required for the posterior distribution to have desirable 
properties. The personalistic Bayesian philosophy not only fails 
to make such a recommendation but asserts that this cannot be 
done at all. 

⁹ Following the Bayesian paradigm, a probability density distribution to
$\theta$ is assigned to express one's uncertainty, not to attribute randomness
to $\theta$ [18].
V. PRIORS

Non-informative prior distributions are generally improper (they do not
normalize) when the parameter space is not compact, which may lead to an improper
posterior. However, a posterior must always be proper, in other words, it must be
a probability (discrete parameter) or a PDF (continuous parameter). To remedy
this problem what is done in practice is to truncate¹⁰ the range of the prior.
However, the ranges for the prior PDF with $\pi(\theta) \neq 0$ restrict the
allowed range for the posterior PDFs in $P(\theta|\{x\})$. Hence, it has to be
verified that the a priori ranges do not introduce a cut in the posterior PDF. If
this, however, is the case one needs to either enlarge the ranges of the prior
PDF where $\pi(\theta) \neq 0$, or to justify the ranges used.

In an ideal case, the posterior PDF should not de- 
pend on the prior PDF but only on the experimental 
likelihood. However, in reality, the choice of prior al- 
ways matters. Belief is not easily measured with high 
accuracy. The extent of approximation hidden in the 
prior densities is seldom considered in Bayesian analyses. 
What is sometimes suggested is to perform a robustness 
analysis [1] (sensitivity analysis of the posterior by con- 
sidering, individually, the effects of a small number of 
potential alternative choices of a model component, such 
as parameterization or prior distribution). However, this
concept is not well-defined. It lacks a criterion of what 
is an acceptable change of the posterior and also which 
class of priors should be used. 

It is often stated that the data swamp the prior: the prior is washed out as the
number of observations increases. The statement of the "washing out" of the prior
lacks a qualitative and also quantitative proof and would need to be verified
case-by-case. Furthermore, the possibility of eventual convergence of belief is
irrelevant to the day-to-day problem of learning from data in science. Moreover,
it provides only an illusion of the existence of an objective probability [2]
(the eventual convergence of opinions by remaining coherent, i.e., internally
consistent with the probability axioms, is not enough to guarantee that the
Bayesian answer is a good answer to a real-world question).

We cannot verify if the Bayesian probability $P(\theta|\{x\})$ is "correct" by
observing the frequency with which $\theta$ occurs, since this is not the way
Bayesian probability is defined. Hence, it would be odd trying to justify post
hoc the priors on frequentist grounds [20].

¹⁰ Use of "vague proper prior" in such situations will formally result in proper
posterior distributions, but these posteriors will essentially be meaningless if
the limiting improper prior had resulted in an improper posterior distribution.

Let us illustrate with a very simple educated scenario how (precise) posterior
results can be solely "determined" by the multidimensional convolution of prior
probability densities. For instance, we may consider an $N$-dimensional parameter
space $x_i$, with $i = 1, 2, \ldots, N$, with $0 < x_i < 1$, in which no
experimental input is known for the values of the $x_i$: one knows nothing at
all. Still, even in such a total absence of knowledge, a Bayesian treatment can
claim powerful constraints. If one is interested in the radius
$R = \sqrt{\sum_{i=1}^{N} x_i^2}$ at which the true set of model parameters lies,
the Bayesian answer is clear. It is shown in Fig. 1, for $N = 6$, and for the
"reasonable" choice of flat priors. One finds $R = 1.39 \pm 0.27$, where the
central value corresponds to the mean radius and the errors give the symmetric
68% posterior probability interval around the mean value. One has achieved the
remarkable feat of learning something about the radius of the hypersphere,
whereas one knew nothing about the Cartesian coordinates and without making any
experiment.

FIG. 1: Bayesian posterior PDF of the radius $R = \sqrt{\sum_{i=1}^{6} x_i^2}$
in a six-dimensional parameter space where no experimental input is known for the
values of the $x_i$, and where the "reasonable" choice of flat priors for the
$x_i$, with $0 < x_i < 1$, has been used. The Bayesian treatment succeeds in
constraining the radius.
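The numbers quoted for the radius can be reproduced approximately with a short Monte Carlo, since without data the posterior is simply the distribution induced by the priors. The sketch below assumes flat priors on $0 < x_i < 1$ for $N = 6$; the sample size, the seed and the function name `radius_summary` are arbitrary illustrative choices. It returns the mean radius and the half-width of the symmetric interval around the mean that contains 68% of the samples.

```python
import math
import random


def radius_summary(n_dim=6, n_samples=100_000, seed=7):
    """Mean of R = sqrt(sum x_i^2) and half-width of the symmetric 68% interval."""
    rng = random.Random(seed)
    radii = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(n_dim)))
        for _ in range(n_samples)
    ]
    mean = sum(radii) / len(radii)
    deviations = sorted(abs(r - mean) for r in radii)
    half_width = deviations[int(0.68 * len(deviations))]
    return mean, half_width


if __name__ == "__main__":
    mean, half_width = radius_summary()
    print(f"R = {mean:.2f} +- {half_width:.2f}")
```

The apparent constraint on $R$ comes only from the volume element of the six-dimensional cube, not from any measurement.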

VI. PARAMETERIZATIONS 

In this section we consider the Bayesian treatment applied to $B \to \pi\pi$
decays (the Bayesian treatment of $B \to \rho\rho$ decays is discussed in
Appendix A). The current world average values for the observables are
$\mathcal{B}^{+-} = (5.1 \pm 0.4) \times 10^{-6}$,
$\mathcal{B}^{+0} = (5.5 \pm 0.6) \times 10^{-6}$,
$\mathcal{B}^{00} = (1.45 \pm 0.29) \times 10^{-6}$, $C^{+-} = -0.37 \pm 0.10$,
$S^{+-} = -0.50 \pm 0.12$, $C^{00} = -0.28 \pm 0.40$ [21]. These observables are
reproduced by the model with the following eight values for the phase $\alpha$
(in degrees), as computed from the analytical solutions in Footnote 2:

$$
\{6.3,\ 83.7,\ 91.8,\ 120.7,\ 128.8,\ 141.2,\ 149.3,\ 178.2\}\,. \tag{8}
$$
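The eight values of Eq. (8) can be recomputed from the central values of the observables with the explicit solutions of Footnote 2. The sketch below loops over the three arbitrary signs. One input is an assumption that does not appear in the text: the lifetime ratio $\tau_{B^+}/\tau_{B^0}$ is set to 1.076, and the results shift by a few tenths of a degree if another value is used.

```python
import math
from itertools import product

# Central values quoted in the text (branching fractions in units of 1e-6).
B_PM, B_P0, B_00 = 5.1, 5.5, 1.45
C_PM, S_PM, C_00 = -0.37, -0.50, -0.28
TAU_RATIO = 1.076  # ASSUMPTION: tau(B+)/tau(B0), not given in the text


def cos_angle(sign):
    """c (sign=+1) or c-bar (sign=-1) as defined in Footnote 2."""
    num = (B_P0 / TAU_RATIO
           + B_PM * (1 + sign * C_PM) / 2
           - B_00 * (1 + sign * C_00))
    den = math.sqrt(2 * B_PM * B_P0 * (1 + sign * C_PM))
    return math.sqrt(TAU_RATIO) * num / den


def alpha_solutions():
    """Return the eight solutions for alpha in degrees, sorted, in [0, 180)."""
    c, cbar = cos_angle(+1), cos_angle(-1)
    sin2 = S_PM / math.sqrt(1 - C_PM ** 2)
    solutions = []
    for e1, e2, e3 in product((1, -1), repeat=3):
        cos2 = e1 * math.sqrt(1 - sin2 ** 2)
        s = e2 * math.sqrt(1 - c ** 2)
        sbar = e3 * math.sqrt(1 - cbar ** 2)
        num = sin2 * cbar + cos2 * sbar + s
        den = cos2 * cbar - sin2 * sbar + c
        solutions.append(math.degrees(math.atan2(num, den)) % 180.0)
    return sorted(solutions)


if __name__ == "__main__":
    print([round(a, 1) for a in alpha_solutions()])
```

Under the stated assumption the eight solutions agree with Eq. (8) to within a fraction of a degree.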

We apply the Bayesian treatment using the four parameterizations in Section II:

• the Standard Model modulus and argument parameterization (MA);

• the Standard Model real and imaginary parameterization (RI);

• the Pivk-LeDiberder parameterization (PLD);

• the explicit solution parameterization (ES).

TABLE I: Ranges taken for the parameters of the various parameterizations used in
this paper to fit the $B \to \pi\pi$ observables. The ranges for the MA, PLD and
ES parameterizations are chosen such that they fully contain the posteriors for
all parameters (see Fig. 3). The ranges for the RI parameterization are chosen
similar to the ones in the MA parameterization. The choice of the ranges in the
RI parameterization will be further discussed in Appendix B 3. The phases are
given in radians.

Obviously, one may consider a much larger variety of parameterizations,¹¹ but the
ones considered here are sufficient to make our point clear. They are all natural
in the sense that they were defined beforehand without having in mind the present
discussion. For all four parameterizations we use uniform priors for all the six
model parameters. In particular, the prior PDF used for $\alpha$ is uniform in
the range $[0, \pi]$ for the MA, RI and PLD parameterizations (for the ES
parameterization, $\alpha$ is not an input model parameter). The choice of
uniformity is not the result of a strong argument, nor is it particularly
natural; rather it is taken for the sake of simplicity and to conform to the
choice made by Bayesian analyses already published on the subject [6]. The ranges
used for the parameters are given in Table I. The resulting posterior PDFs for
$\alpha$ are shown in Fig. 2 (the posterior PDFs for the other model parameters
are shown in Fig. 3). The top plot gives the $1 - \mathrm{CL}$ result for the
frequentist treatment. It is independent of the parameterization used. The eight
solutions in Eq. (8) for the phase $\alpha$ are clearly visible, and correspond
exactly to the analytical solutions in Footnote 2.



¹¹ A fifth parameterization is introduced in Appendix B 3 to discuss further
Bayesian peculiarities with the RI parameterization.



A. Modulus and Argument Parameterization 

The Bayesian treatment indicates the presence of basically two mirror solutions;
the previous mirror solution at $\alpha \sim 135^\circ$ and a new solution, which
is strongly favored, at $\alpha \sim 165^\circ$. One also observes that values
near 0° and 180° are excluded in this parameterization, in contrast with the
following parameterizations. In the Bayesian approach, all the information
resides in an individual's posterior probability: the posterior PDF for $\alpha$
must be read as the individual's updated belief in the plausible values of the
parameter $\alpha$. The individual's degree of belief is thus higher for a value
of $\alpha \sim 165^\circ$ than for $\alpha \sim 135^\circ$.



B. Real and Imaginary Parameterization 

The Bayesian treatment seems to detect the presence of two mirror solutions, a
mirror solution at $\alpha \sim 10^\circ$ and another solution, which is favored,
at $\alpha \sim 175^\circ$. The posterior PDF appears to vanish at the origin
(and at 180°). However, this is only an artifact resulting from the truncation of
the prior ranges used for $|T^{+-}|$, $\mathrm{Re}(P)$ and $\mathrm{Re}(T^{00})$
(see Footnote 10). As demonstrated in Appendix B 3 the posterior diverges for
$\alpha = 0^\circ$ (180°) and is not normalizable. Expanding the ranges leads to
an improper posterior for $\alpha$.



C. Pivk-LeDiberder Parameterization 

The Bayesian treatment only vaguely detects the presence of mirror solutions. The
posterior PDF tends to favor a value for $\alpha$ (135°) which corresponds to a
dip in the frequentist $1 - \mathrm{CL}$. This is a hint that the Bayesian
treatment introduces a piece of information which is "missed" by the frequentist
analysis. Since the latter uses all the available experimental data, this
additional piece of information must be embedded in the priors. One also observes
that values near 0° and 180° are not disfavored.

FIG. 2: Results for the CKM phase $\alpha$ obtained with the different
parameterizations from the $B \to \pi\pi$ isospin analysis. The upper plot shows
the frequentist confidence level, which is independent of the parameterization
used. The solutions (1) through (8) coincide with the analytical solutions
computed from the expressions given in Footnote 2. The remaining plots show the
Bayesian a posteriori PDFs for the parameterizations indicated by the labels. The
posterior PDF in the RI parameterization depends strongly on the ranges chosen
for the priors of the $|T^{+-}|$, $\mathrm{Re}(P)$ and $\mathrm{Re}(T^{00})$
parameters (see Appendix B 3).



D. Explicit Solution Parameterization 

The Bayesian treatment detects the presence of mirror solutions, but as in the
PLD parameterization, one solution is favored for $\alpha$ (135°). The fact that
one solution is favored over the others is because two nearby mirror solutions
are superimposed in the Bayesian treatment (see Section VIII). This feature is
present in all parameterizations. In the RI parameterization, it is reflected
only as a shoulder in the posterior PDF. One also observes that values near 0°
and 180° are not disfavored. The posterior PDF for $\alpha$ is akin to the one
obtained with the PLD parameterization. This is because the latter
parameterization is chosen close to the measured quantities.
[Figure 3 panels: MA parameterization, a posteriori PDFs; RI parameterization, a posteriori PDFs]

FIG. 3: Posterior PDFs for the various model parameters used in this paper, except for $\alpha$, which is shown in Fig. 2. One
observes that for the RI parameterization the posterior PDFs for Re($P$) and Re($T^{00}$) do not vanish at the lower edge of their
ranges, so the ranges should be expanded further. Although the posterior for $|T^{+-}|$ appears to be contained in its prior range,
it turns out that this range must also be expanded further. More generally, as illustrated by the RI parameterization, for the sake of
consistency, a Bayesian treatment must provide the proof that the ranges chosen for the priors do not introduce a hidden piece
of information. It is shown in Appendix B 3 that for the RI parameterization the choice of the ranges actually determines the
posterior PDF for $\alpha$.



E. Conclusion 

The Bayesian PDFs result from a prior- and 
parameterization-dependent weighted average of data 
with mirror solutions. The fact that the posterior PDFs 
in the MA and RI parameterizations are so different com- 
pared to the PLD and ES parameterizations is due to a 
strong prior dependence. The behavior of the posterior 
PDF at the origin (and at 180°) also strongly depends 
on the parameterization (see Appendix B). 

From all these results, what is a scientist supposed to
communicate as a value for the parameter $\alpha$, without
forgetting that they are all based on the same data?
Bayesianism is simply a system for keeping one's internal
beliefs self-consistent. It is not concerned with whether or
not these beliefs represent the information content of the
data.



VII. REMOVING ESSENTIAL INFORMATION 

It is instructive to study the behavior of the Bayesian
posteriors in a situation where one deliberately removes
crucial data from the analysis, e.g., when performing the fit
without using the $B^0 \to \pi^0\pi^0$ branching fraction and
direct CP-asymmetry measurements. In this case it is well
known [7] that no information on $\alpha$ can be derived from
the data (with the exception of the exclusion of $\alpha = 0$
in case of non-zero CP violation, see Appendix B 1). In effect,
having carried out the fit for a given $\alpha$ value, the
values of the model parameters that correspond to this fit can
be used to compute explicitly the values of the model
parameters corresponding to any other value of $\alpha$, if
non-zero (see Appendix B 1): this second set of values yields a
fit of exactly the same quality ($\chi^2$). Stated differently,
the posterior for $\alpha$ provided by the Bayesian treatment,
if unbiased, must be uniform since the data do not favor any
value of $\alpha$. The posterior PDFs obtained for the four
parameterizations are shown in Fig. 4. While the PLD
parameterization yields the expected uniform PDF, the three
others do not: they are able to extract information on
$\alpha$, which is introduced by the priors.

FIG. 4: Results for the CKM phase $\alpha$ obtained with the different parameterizations from the $B \to \pi\pi$ isospin analysis, without
using the input from the $B^0 \to \pi^0\pi^0$ branching fraction and direct CP-asymmetry measurements. No model-independent
constraint on $\alpha$ can be inferred in this case (with the exception of the exclusion of the singular points $\alpha = 0, \pi$). The upper
plot shows the frequentist confidence level, which is independent of the parameterization used. The remaining plots show the
Bayesian a posteriori PDFs for the parameterizations indicated by the labels (the unsmooth aspect of some PDFs is due to the
fact that we used "only" 10 (sic!) Monte Carlo events to carry out the numerical integration).
FIG. 5: 2D example (Eqs. (9) and (11)): the values of
$\exp(-\chi^2/2)$ as a function of the parameters $(\alpha, \mu)$.



VIII. MIRROR SOLUTIONS IN A SIMPLE 2D 
PROBLEM 

In this section we present a simple and solvable
two-dimensional example to illustrate how mirror solutions make
the Bayesian approach fail. We work within a theory that
predicts the expressions of two observables $X$ and $Y$ as a
function of the two parameters $\alpha$ and $\mu$:

$$ X = (\alpha + \mu)^2 , $$
$$ Y = \mu^2 . \qquad (9) $$

Assuming $(X_m, Y_m)$ are measured values for the observables,
the central values for the parameters can be found by inverting
the above system:

$$ \alpha_0 = \varepsilon_X \sqrt{X_m} - \varepsilon_Y \sqrt{Y_m} , \qquad (10) $$
$$ \mu_0 = \varepsilon_Y \sqrt{Y_m} , $$

where $\varepsilon_X, \varepsilon_Y = \pm 1$. Hence there are in
general four solutions for $(\alpha_0, \mu_0)$ for a given set
of measurements. Note that this example is far from being
academic, since the theoretical expressions above are very
similar to the usual amplitudes $\leftrightarrow$ branching
fraction relations in particle physics. The pattern of discrete
ambiguities is also very similar to the one encountered in the
isospin analysis for the CKM phase $\alpha$.
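
As a quick numerical cross-check of Eq. (10), the four-fold ambiguity can be enumerated directly. The sketch below is illustrative only: the function name `solutions` is hypothetical, and the input values are the measurements quoted in Eq. (11).

```python
import math

def solutions(x_m, y_m):
    """Return the four (alpha_0, mu_0) pairs of Eq. (10) for measured X_m, Y_m."""
    pairs = []
    for eps_x in (+1, -1):
        for eps_y in (+1, -1):
            mu_0 = eps_y * math.sqrt(y_m)
            alpha_0 = eps_x * math.sqrt(x_m) - mu_0
            pairs.append((alpha_0, mu_0))
    return sorted(pairs)

for alpha_0, mu_0 in solutions(1.00, 1.10):
    # Each pair reproduces X = (alpha + mu)^2 and Y = mu^2 for the inputs above.
    print(f"alpha_0 = {alpha_0:+.2f}, mu_0 = {mu_0:+.2f}")
```

The two solutions closest to the origin in $\alpha$ come from opposite signs of $\mu_0$, which is why they are well separated in the two-dimensional parameter space even though their $\alpha$ values are close.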

If $\alpha$ and $\mu$ are fundamental physics parameters, Nature
can only accommodate a single pair of values. This means that
the representation of the four-valued discrete ambiguity
$(\varepsilon_X, \varepsilon_Y) = (+1, +1), (+1, -1), (-1, +1), (-1, -1)$
must be interpreted as a logical exclusive OR operator.

We assume that an experiment has measured the observables from
a Gaussian sample of events, with the results

$$ X = 1.00 \pm 0.07 , $$
$$ Y = 1.10 \pm 0.07 . \qquad (11) $$
FIG. 6: Top: result of the frequentist fit to the data. The
peaks are exactly located at the analytical solutions
$\alpha = -2.05, -0.05, 0.05, 2.05$ computed from Eq. (10). Bottom:
a posteriori PDF according to the Bayesian treatment. The
most probable value is biased.



While the measurement is reasonably precise (below the 10%
level), and the mirror solutions are well separated in the
$(\alpha, \mu)$ space (see Fig. 5), the central values
correspond to a somewhat "unlucky" situation since, as shown in
Fig. 6, the two solutions for small $\alpha$ overlap somewhat.
This kind of overlap corresponds precisely to what may occur in
the isospin analysis of the $B \to \pi\pi$ data.

The result of the Bayesian procedure applied to this 2D example
is shown in Fig. 6. One immediately sees the striking
difference with respect to the frequentist fit: after the
marginalization with respect to the nuisance parameter $\mu$,
only one peak instead of two is left close to the origin in the
$\alpha$ constraint, and its best value is biased with respect
to the minimum-$\chi^2$ ones (which coincide with the explicit
solutions of Eq. (10)).

This unexpected behavior is best understood by looking at
Fig. 5. In this space the likelihood is Gaussian and the four
solutions do not overlap; while the minimum-$\chi^2$ fit
selects, for each value of $\alpha$, the best value of $\mu$
with respect to the data, the Bayesian procedure integrates
[22] all events in the $\mu$ direction that correspond to the
same value of $\alpha$. In other words, the frequentist
approach naturally implements the logical exclusive OR
operating on the solutions, in contrast to the Bayesian
approach that effectively replaces OR by AND. The latter is
clearly unacceptable if one is used to the common wisdom that
fundamental parameters have a definite single value realized in
Nature. In the present case, the counter-intuitive result of
the Bayesian procedure is that the most probable value of
$\alpha$ is $\alpha = 0$, which in turn implies $X = Y$, which
is not the situation preferred by the data (see Eq. (11)).
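
The contrast between the two procedures can be reproduced with a few lines of code. The sketch below is a minimal illustration, not the computation behind Fig. 6: it assumes uncorrelated Gaussian errors, a flat prior on $\mu$ truncated to the hypothetical range $[-3, 3]$, and a simple grid in place of a proper minimizer or integrator. It evaluates the profile $\chi^2$ (minimized over $\mu$) and the unnormalized marginal posterior (integrated over $\mu$) at $\alpha = 0$ and at the analytical solution closest to it.

```python
import math

X_M, Y_M, SIGMA = 1.00, 1.10, 0.07  # measurements of Eq. (11)
STEP = 0.001
# Assumed flat-prior range for mu: a hypothetical choice, wide enough to hold all solutions.
MU_GRID = [-3.0 + STEP * k for k in range(6001)]

def chi2(alpha, mu):
    """Chi-square of the Eq. (9) predictions against the Eq. (11) measurements."""
    return (((alpha + mu) ** 2 - X_M) ** 2 + (mu ** 2 - Y_M) ** 2) / SIGMA ** 2

def profile_chi2(alpha):
    """Frequentist-style treatment: minimise chi2 over mu at fixed alpha."""
    return min(chi2(alpha, mu) for mu in MU_GRID)

def marginal_posterior(alpha):
    """Bayesian-style treatment: integrate exp(-chi2/2) over mu (unnormalised)."""
    return STEP * sum(math.exp(-0.5 * chi2(alpha, mu)) for mu in MU_GRID)

alpha_sol = math.sqrt(Y_M) - math.sqrt(X_M)  # analytical solution just above zero
for alpha in (0.0, alpha_sol):
    print(f"alpha={alpha:.3f}  profile chi2={profile_chi2(alpha):.3f}  "
          f"marginal={marginal_posterior(alpha):.4f}")
```

Under these assumptions the profile $\chi^2$ essentially vanishes at the analytical solution and is of order one at $\alpha = 0$, whereas the marginal density is larger at $\alpha = 0$ than at the solution itself, because the integration over $\mu$ adds the contributions of the two nearby mirror solutions.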



IX. ORIGIN OF THE PROBLEM 

The origin of the problem lies in the very first Bayesian 
assumption, namely that unknown model parameters are 
to be understood as mathematical objects distributed ac- 
cording to PDFs, which are assumed to be known: the 
priors. Obviously, the choice of the priors cannot be irrel- 
evant; hence, the Bayesian treatment is doomed to lead to 
results which depend on the decisions made, necessarily 
on unscientific basis, by the authors of a given analysis, 
for the choice of these extraordinary PDFs. This is not 
a new situation. This limitation — deliberately mixing 
information coming from scientific data together with hu- 
man inputs — is frequently perceived as mild, because 
"reasonable variations" of the human inputs lead to "rea- 
sonable variations" of the output of the Bayesian treat- 
ment. Moreover, a particular shape of PDF is declared 
to be "reasonable": the uniform PDF. This is because, 
implicitly, one feels that using a flat distribution for pri- 
ors is akin to using no information at all, hence injecting 
only a weak a priori assumption through the priors. 

What "reasonable variations" means is neither more nor less
clear than what "systematic uncertainties" [23] mean in
general: the dependence on the priors can be viewed as
a systematic uncertainty of human origin. What is par- 
ticular in the Bayesian treatment discussed in this article 
is that "reasonable variations" of the human inputs lead 
to clearly "unreasonable variations" of the output of the 
Bayesian treatments. For example, the posterior PDF 
for a (see Fig. 2) using the RI parameterization (with 
the ranges of Table I) leads one to conclude that the 
Standard Model is close to being defeated; knowing that 
the bulk of data (from other B decays) indicates a value 
of the phase a = (100±g 5 )° [24]. Worse, if one widens 
to infinity the ranges used for the RI parameterization, 
the Bayesian treatment fails to provide a proper PDF for 
a (see appendix B3). The key here is that the statis- 
tical analysis dealt with is performed in a multivariate 
parameter space. It is well known that the naive use of
"uninformative" priors, especially in multivariate spaces,
suffers from various paradoxes and is thus widely criticized in
the literature [4, 25]. Our examples are even more striking
because of strong non-linearities and exact degeneracy among
several mirror solutions, and show that the analysis presented
in Ref. [6] bears no physical meaning.



X. CONCLUSION 

We have demonstrated in this paper that in the isospin
analysis of $B \to \pi\pi$ decays used to constrain the CKM
phase $\alpha$, despite the rather precise experimental inputs,
the posterior PDF for $\alpha$ shows a striking dependence on
the shape and ranges used for the prior PDFs, and on the choice
of the amplitude parameterization.
The very frequent naive use of flat priors as presumed 
"non-informative" priors hides important unwarranted 
assumptions, which may easily invalidate the analysis. 
As stressed by David R. Cox at the Phystat05 confer- 
ence [26]: "It seems to be agreed from all theoretical 
standpoints that flat or ignorance priors are dangerous, 
although they are widely used [in Bayesian statistics]. 
Flat priors in several dimensions may produce clearly un- 
acceptable answers." Flat priors are informative. There 
is a big difference between no knowledge, and a distribu- 
tion that is uniform in some (arbitrary) parameterization. 
Contrary to what is often stated, Bayesian and frequen- 
tist statistics can lead to very different conclusions about 
the same data, even in the limit of a large data sam- 
ple. The choice of priors always matters in the Bayesian 
treatment. 
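
A one-line change of variables makes this concrete. As an illustration (borrowing the parameter $\mu$ and the observable $Y = \mu^2$ of the two-dimensional example of Section VIII, with a hypothetical prior range $0 < \mu < \mu_{\max}$), a prior that is flat in $\mu$ is not flat in $Y$:

```latex
p(\mu) = \frac{1}{\mu_{\max}}
\quad\Longrightarrow\quad
p(Y) = p(\mu)\left|\frac{d\mu}{dY}\right|
     = \frac{1}{2\,\mu_{\max}\sqrt{Y}} ,
\qquad 0 < Y < \mu_{\max}^{2} .
```

The same nominal state of "ignorance" thus favors small values of $Y$ as soon as it is expressed in the other variable.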

The proponents for the use of Bayesian statistics in 
High Energy Physics often argue that "everybody is 
Bayesian in everyday life". It is certain that Bayesian 
reasoning dominates the decision making mechanism of 
our everyday life. Scientific results however differ in many 
aspects from everyday-life reasoning. In particular, the
translation of prior beliefs into a robust posterior number
representing universal knowledge is impossible. Only the
exact knowledge of all prior assumptions (including the 
parameterization, PDF shapes and parameter ranges) 
used in a certain Bayesian analysis allows another one 
to reproduce its posterior result. Bayesian statistics has 
been shown to be useful for decision making, such as buy- 
ing and selling stock options, or filtering out unwanted 
electronic mails. The decision-making problem is not sci- 
entific in nature. 

Leonard J. Savage, a founder of modern subjective
Bayesianism, makes very clear throughout his work that
the theory of personal probability "is a code of consis- 
tency for the person applying it, not a system of predic- 
tions about the world around him". "But is a personal 
code of consistency, requiring the quantification of per- 
sonal opinions, however vague or ill formed, an appropri- 
ate basis for scientific inference?" [5]. Isn't a hypothesis 
in physical science either true or false? What is hence the 
meaning of the prior in this case? The Bayesian personal 
probability theory misses the main point: "it confuses 
feeling with fact" [14]. In physical sciences, the theory
is intended to describe the external physical universe,
rather than one's internal psychological state.

TABLE II: Ranges taken for the parameters of the various parameterizations used to fit the $B \to \rho\rho$ observables. The phases
are given in radians.

Science has to summarize the available information the 
best it can. The Bayesian tools do not tell us what we 
want to know in science. What we seek are methods 
for generating and analyzing data and for using data to 
learn about experimental processes in a reliable manner. 
The kinds of tools needed to do this are crucially differ- 
ent from those the Bayesian statistics supplies [5] . These 
difficulties with the personalistic Bayesian approach in- 
dicate that there is a need for objective probabilities and 
a methodology for statistics based on testing. However the
test statistic is chosen, such a methodology allows for an
objective interpretation of the scientific results [27].
APPENDIX A: BAYESIAN TREATMENT
APPLIED TO $B \to \rho\rho$ DECAYS

The (longitudinally polarized) $B \to \rho\rho$ system is
similar to $B \to \pi\pi$ except that only an upper bound on the
branching fraction to $\rho^0\rho^0$ is currently available.
The current world average values for the observables are
$B^{+-} = (25.1 \pm 3.7) \times 10^{-6}$,
$B^{+0} = (19.1 \pm 3.5) \times 10^{-6}$,
$B^{00} = (0.54 \pm 0.41) \times 10^{-6}$,
$C^{+-} = -0.02 \pm 0.17$,
$S^{+-} = -0.22 \pm 0.22$ [28] (the longitudinal polarizations
are $f^{+-} = 0.966 \pm 0.025$ and $f^{+0} = 0.96 \pm 0.06$). The
ranges for the flat prior PDFs in the various parameterizations
are summarized in Table II. The resulting posterior PDFs for
$\alpha$ are shown in Fig. 7. Again the frequentist treatment
(see top plot in Fig. 7) is independent of the parameterization
used. The constraints on $\alpha$ are lumped together under the
two plateaus in the $1 - \mathrm{CL}$ curve.

In the PLD and ES parameterizations, the range of plausible
values for the parameter $\alpha$ respects the symmetry of the
problem. As explained in Appendix B 1, the experimental data
are still consistent with no CP violation, so the posterior PDF
for $\alpha = 0\,[\pi]$ does not vanish. However, in the
Standard Model parameterization, the prior breaks the symmetry
of the problem and leads to a disaster in the RI
parameterization (see discussion in Appendix B 3). In both
cases, the information contained in the data is convolved with
additional information provided by the priors: the posterior is
dominated by the priors and not by the data.



APPENDIX B: THE $\alpha \to 0$ LIMIT

In the next two sections, we present two additional
parameterizations that describe finite CP violation with
$\alpha = 0$ and finite values for the remaining parameters,
and that are mathematically equivalent to the Standard Model
parameterization except at $\alpha = 0$.


1. Mathematics 

The point $\alpha = 0\,[\pi]$ is a particular point of the
theory for physical reasons. In the Standard Model it
corresponds to vanishing CP violation. The compatibility of the
current $B \to \pi\pi$ data with this point is only marginal,
because the CP asymmetries are found to be different from zero
[21, 29].

As described in Sections III and IV, the behavior of the two
statistical approaches around $\alpha = 0\,[\pi]$ shows striking
differences. In the $B \to \pi\pi$ case, the frequentist method
provides no significant exclusion confidence level for values
of $\alpha$ close to zero (the fact that Fig. 2 (upper plot)
depicts a continuous $1 - \mathrm{CL}$ line through the point
$\alpha = 0$ is an artifact of the chosen binning of the
$\alpha$ scan that does not hit this singular point). Setting
$\alpha$ to zero in the Standard Model parameterization leads to
a confidence level of the order of $10^{-8}$, which reflects the
presence of CP violation in the data. The frequentist
confidence level as a function of $\alpha$ is thus
discontinuous, as explained below and in Appendix B 2.

FIG. 7: Results for the CKM phase $\alpha$ obtained with the different parameterizations from the $B \to \rho\rho$ isospin analysis. The
upper plot shows the frequentist confidence level, which is independent of the parameterization used. The remaining plots show
the Bayesian a posteriori PDFs for the parameterizations indicated by the labels.

On the other hand, whatever the values of the observables, the
PDF as a function of $\alpha$ is continuous at the origin in
the Bayesian approach, and exhibits a clear drop in the MA
parameterization. In the $\rho\rho$ case, where the
experimental data are still compatible with the absence of CP
violation, the $\alpha = 0\,[\pi]$ value has a fairly good
PDF/CL in both Bayesian and frequentist methods (with however
the usual strong prior dependence for the Bayesian result).

The fact that vanishingly small, but non-zero, values of the CP
phase $\alpha$ can generate finite CP violation is easily seen
as follows. Let us take the branching fraction $B^{+-}$ and CP
asymmetry $C^{+-}$ as an example: in the Standard Model
parameterization, they can be written as

$$ B^{+-} = |T^{+-}|^2 + 2\cos\alpha\,\mathrm{Re}(T^{+-}P^*) + |P|^2 , \qquad (B1) $$
$$ C^{+-} = \frac{2\sin\alpha\,\mathrm{Im}(P/T^{+-})}{1 + 2\cos\alpha\,\mathrm{Re}(P/T^{+-}) + |P/T^{+-}|^2} . \qquad (B2) $$

The limit $\alpha \to 0$, $|T^{+-}|, |P| \to \infty$, and
$P/T^{+-} \to -1$ is in general indeterminate as far as the
observables are concerned. In particular any finite value can
be accommodated in this limit. The complete inspection of the
six observables shows that the combined limit 12 a — > 0, 
|T+-|, |T 00 |, |P| -> oo and P/T+- P/T Q0 -> -1 leads 
in general to finite branching fractions and CP asymme- 
tries. When a is very small but non zero, this peculiar 
limit is fully taken into account by the frequentist fit, 
which results in a finite confidence level. In the Bayesian 
method, however, the priors mechanically suppress the 
above parameter configuration. 
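
This behavior can be checked numerically. The sketch below is illustrative: it assumes the expressions (B1) and (B2) for the $B^{+-}$ branching fraction and the $C^{+-}$ asymmetry (up to an overall normalization), and approaches the limit along one particular path, $T^{+-} = \tau/\sin\alpha$ and $P = p - \tau/\tan\alpha$ with fixed finite complex numbers $\tau$ and $p$ (the same substitution as in Eq. (B5) of Appendix B 3). The function names and the numerical values of $\tau$ and $p$ are hypothetical.

```python
import math

def sm_parameters(alpha, tau, p):
    """Map finite (tau, p) to Standard Model amplitudes (T, P); both grow as alpha -> 0."""
    return tau / math.sin(alpha), p - tau / math.tan(alpha)

def observables(alpha, T, P):
    """Branching fraction (B1) and CP asymmetry (B2), up to overall normalisation."""
    b = abs(T) ** 2 + 2 * math.cos(alpha) * (T * P.conjugate()).real + abs(P) ** 2
    r = P / T
    c = 2 * math.sin(alpha) * r.imag / (1 + 2 * math.cos(alpha) * r.real + abs(r) ** 2)
    return b, c

tau, p = 1.0 + 0.0j, 0.5 + 0.5j  # arbitrary finite values, chosen for illustration
for alpha in (1e-1, 1e-2, 1e-3):
    T, P = sm_parameters(alpha, tau, p)
    b, c = observables(alpha, T, P)
    print(f"alpha={alpha:.0e}  |T|={abs(T):9.2f}  Re(P/T)={(P / T).real:+.5f}  "
          f"B={b:.4f}  C={c:.4f}")
```

Along this path $|T^{+-}|$ and $|P|$ grow like $1/\alpha$ and $P/T^{+-}$ tends to $-1$, while the branching fraction and the asymmetry stay fixed at finite, non-zero values. A prior that is flat on a bounded range of $|T^{+-}|$ and $|P|$ necessarily cuts this region off.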

Finally we note that the limit $\alpha \to 0$ with finite
observables is obtained with finite values of the parameters
a ff, A 1 ) a i & an d A in the PLD parameterization: this is 
due to the fact that $\alpha_{\rm eff}$ can generate CP
violation even with $\alpha$ being strictly set to zero. The
Standard Model and the PLD parameterizations are equivalent
except at the point $\alpha = 0$, where the Jacobian
corresponding to the change of variables is singular.



2. Physics 

The parameterization in Eq. (1) naively holds only within the
Standard Model. Let us assume arbitrary new physics
contributions to the $\Delta I = 1/2$ channel; the most general
parameterization can be written as

$$ A^{+-} = e^{-i\alpha} T^{+-} + P + M_{\rm NP} , $$
$$ \sqrt{2} A^{00} = e^{-i\alpha} T^{00} - P - M_{\rm NP} , \qquad (B3) $$
$$ \sqrt{2} A^{+0} = e^{-i\alpha} (T^{00} + T^{+-}) , $$

where $M_{\rm NP}$ is a complex $\Delta I = 1/2$ new physics
amplitude. For the $\bar{B}^0$ decay the amplitudes above are
CP-transformed in the following way: $\alpha \to -\alpha$,
$M_{\rm NP} \to \bar{M}_{\rm NP}$. Thus in general
$M_{\rm NP} \neq \bar{M}_{\rm NP}$ and the new physics
contribution also violates CP.

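It is now clear that, if $\alpha \neq 0$ or $\pi$, Eq. (B3) can
be recast into the Standard Model form (see Eq. (1)) [30], with
$(T, P) \to (\tilde{T}, \tilde{P})$ and

$$ \tilde{T}^{+-} = T^{+-} + \frac{\bar{M}_{\rm NP} - M_{\rm NP}}{2i\sin\alpha} , $$
$$ \tilde{T}^{00} = T^{00} - \frac{\bar{M}_{\rm NP} - M_{\rm NP}}{2i\sin\alpha} , \qquad (B4) $$
$$ \tilde{P} = P + \frac{e^{i\alpha} M_{\rm NP} - e^{-i\alpha} \bar{M}_{\rm NP}}{2i\sin\alpha} . $$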

12 Obviously, the divergence of some of the amplitudes is not an
appealing physical result! However, in the present context of a
set of six observables analyzed in the framework of SU(2)
symmetry, nothing prevents its occurrence. In practice, one may
add to the analysis new observables which could preclude these
divergences.
FIG. 8: Fit results of the isospin analysis to the $B \to \pi\pi$
observables. Black dots: Standard Model parameterization (see
Eq. (1)); shaded curve: the same but allowing arbitrary new
physics contributions to the $\Delta I = 1/2$ channel (see Eq. (B4)).
In other words the Standard Model parameterization of the
isospin analysis is mathematically equivalent to a general
Standard Model + new physics parameterization, provided that
the new contributions are purely $\Delta I = 1/2$. This
equivalence is exact and holds for any value of the parameters,
except for $\alpha = 0\,[\pi]$.
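
The equivalence can be verified numerically. The sketch below is illustrative: it assumes the amplitude structure of Eq. (B3), uses the redefinition of the tree and penguin terms obtained by solving the decay and CP-conjugate amplitude equations simultaneously, and takes arbitrary complex test values. All function and variable names are hypothetical.

```python
import cmath
import math

def amplitudes(alpha, t_pm, t_00, p, m_np=0j):
    """(A+-, A00, A+0) with an optional Delta I = 1/2 new-physics term m_np."""
    phase = cmath.exp(-1j * alpha)
    a_pm = phase * t_pm + p + m_np
    a_00 = (phase * t_00 - p - m_np) / math.sqrt(2)
    a_p0 = phase * (t_00 + t_pm) / math.sqrt(2)
    return a_pm, a_00, a_p0

def absorb_new_physics(alpha, t_pm, t_00, p, m_np, m_np_bar):
    """Shift (T+-, T00, P) so that the new-physics terms disappear (needs sin(alpha) != 0)."""
    s = 2j * math.sin(alpha)
    shift = (m_np_bar - m_np) / s
    p_shifted = p + (cmath.exp(1j * alpha) * m_np - cmath.exp(-1j * alpha) * m_np_bar) / s
    return t_pm + shift, t_00 - shift, p_shifted

alpha = 1.2
t_pm, t_00, p = 1.0 + 0.2j, 0.7 - 0.4j, 0.3 + 0.1j   # arbitrary test values
m_np, m_np_bar = 0.25 - 0.15j, -0.1 + 0.35j          # CP-violating new physics

shifted = absorb_new_physics(alpha, t_pm, t_00, p, m_np, m_np_bar)
# Amplitudes of Eq. (B3) use (alpha, m_np); their CP conjugates use (-alpha, m_np_bar).
for sign, m in ((+1, m_np), (-1, m_np_bar)):
    with_np = amplitudes(sign * alpha, t_pm, t_00, p, m)
    sm_form = amplitudes(sign * alpha, *shifted)
    print(max(abs(x - y) for x, y in zip(with_np, sm_form)))
```

The shifted parameters reproduce all six amplitudes to rounding precision. The division by $\sin\alpha$ shows why the construction breaks down at $\alpha = 0\,[\pi]$.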

This equivalence has an important physical consequence: it
means that the isospin analysis for the phase $\alpha$ bears no
information on the $\Delta I = 1/2$ channel. In particular it
may happen that CP violation is observed in specific
asymmetries such as $S^{+-}$, $C^{+-}$, $C^{00}$, whereas
$\alpha$ remains compatible with zero or $\pi$: this
configuration corresponds to a situation in which CP violation
is generated by non-standard contributions to the
$\Delta I = 1/2$ channel. Since the Standard Model
parameterization already contains implicitly this possibility,
there is no need to add a specific new physics term to the fit.

Figure 8 shows the result of the frequentist fit (see Footnote
13) for both parameterizations from Eqs. (1) and (B3). As
expected from the above discussion, the two curves are strictly
identical except at the $\alpha = 0\,[\pi]$ points, where the
confidence level is low in the Standard Model parameterization
due to the presence of CP violation in the data.

In contrast to the frequentist approach, the Bayesian treatment
violates the equivalence described in Eq. (B4), because
choosing finite priors in one parameterization automatically
restricts the phase space in the other parameterization. The
Bayesian posterior PDF for $\alpha$ in the parameterization in
Eq. (B3) would yield a fairly good probability density at
$\alpha \simeq 0$, since the amplitudes $M_{\rm NP}$ and
$\bar{M}_{\rm NP}$ can generate CP violation.

13 Note that this is a non-trivial example where one has fewer
observables (six) than parameters in Eq. (B3) (ten).

It may appear legitimate to perform a fit to the $B \to \pi\pi$
data within the Standard Model, allowing for new physics in
neither the $\Delta I = 3/2$ nor the $\Delta I = 1/2$
transitions. From the above discussion it is not completely
possible, unless the amplitudes are known exactly within the
Standard Model. Still, it is possible within the frequentist
framework to restrict the parameter space to "reasonable"
ranges. Regardless of the accuracy of these ranges, this would
not be the isospin analysis anymore, since the isospin symmetry
only predicts Eq. (1) without telling us anything about the
order of magnitude of the transition amplitudes.







3. The RI parameterization 

Mathematically, the fact that the RI parameterization leads to
a cataclysm in the Bayesian treatment can be understood as
follows. For a given set of model parameters $T^{+-}$, $P$,
$T^{00}$ in Eq. (1) one can always define the secondary
parameters $\tau^{+-}$, $p$, $\tau^{00}$ such that:

$$ T^{+-} = \tau^{+-} / \sin\alpha , $$
$$ P = p - \tau^{+-} / \tan\alpha , \qquad (B5) $$
$$ T^{00} = \tau^{00} - \tau^{+-} / \sin\alpha . $$

The expressions of the three amplitudes in Eq. (1) then take
the form

$$ A^{+-} = -i\tau^{+-} + p , $$
$$ \sqrt{2} A^{00} = i\tau^{+-} + e^{-i\alpha}\tau^{00} - p , \qquad (B6) $$
$$ \sqrt{2} A^{+0} = e^{-i\alpha}\tau^{00} , $$

where for the CP-conjugate amplitudes one has to transform
$\alpha \to -\alpha$, $\tau^{+-} \to -\tau^{+-}$. It is now clear
that $\tau^{+-} \neq 0$ can generate finite CP violation even if
$\alpha = 0$. The Jacobian of the transformation is

$$ \frac{d|\tau^{+-}|\, d\mathrm{Re}(p)\, d\mathrm{Im}(p)\, d\mathrm{Re}(\tau^{00})\, d\mathrm{Im}(\tau^{00})}{d|T^{+-}|\, d\mathrm{Re}(P)\, d\mathrm{Im}(P)\, d\mathrm{Re}(T^{00})\, d\mathrm{Im}(T^{00})} = |\sin\alpha| , \qquad (B7) $$

and thus, in the RI parameterization, the posterior for
$\alpha$ can be written as:

$$ P_{\rm RI}(\alpha) \propto \frac{1}{|\sin\alpha|}\, P_{{\rm RI}\tau}(\alpha) , \qquad (B8) $$

with

$$ P_{{\rm RI}\tau}(\alpha) \propto \int e^{-\frac{1}{2}\chi^2[|\tau^{+-}|,\, \mathrm{Re}(p),\, \mathrm{Im}(p),\, \mathrm{Re}(\tau^{00}),\, \mathrm{Im}(\tau^{00}),\, \alpha]}\; d|\tau^{+-}|\, d\mathrm{Re}(p)\, d\mathrm{Im}(p)\, d\mathrm{Re}(\tau^{00})\, d\mathrm{Im}(\tau^{00}) . $$

The $P_{{\rm RI}\tau}$ function is the posterior PDF for $\alpha$
in the "RI$\tau$" parameterization. It is shown in Fig. 9 with
the $B \to \pi\pi$ observables. One observes that
$P_{{\rm RI}\tau}$ is regular and finite for $\alpha = 0$ and
$\alpha = \pi$. As a result, it follows that in the RI
parameterization the posterior $P_{\rm RI}(\alpha)$ is not only
divergent at the origin (and for $\alpha = \pi$) but is an
improper PDF: it cannot be normalized to unity. Applying the
same treatment to the MA parameterization exhibits no
singularity like the one obtained in the RI parameterization:
the Jacobian $|\partial{\rm RI}\tau/\partial{\rm MA}|$ behaves
like $|\tau^{+-}|^2/\alpha$ at the origin, which ensures that the
posterior PDF for $\alpha$ in the MA parameterization vanishes
at $\alpha = 0^\circ$ ($\alpha = 180^\circ$), in agreement with
Fig. 2.

FIG. 9: Posterior PDF for $\alpha$ in the "$\tau$" parametrization (see
Eq. (B6)) for $B \to \pi\pi$ decays.
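
The non-normalizability can be made explicit with a toy calculation. The sketch below assumes, purely for illustration, that $P_{{\rm RI}\tau}(\alpha)$ is constant, so that the RI posterior of Eq. (B8) reduces to $1/|\sin\alpha|$ on $[\epsilon, \pi - \epsilon]$; the cutoff $\epsilon$ is a hypothetical stand-in for the truncation induced by finite prior ranges. The normalization integral then equals $2\ln\cot(\epsilon/2)$, which grows without bound as $\epsilon \to 0$.

```python
import math

def norm_closed_form(eps):
    """Integral of 1/sin(alpha) over [eps, pi - eps], i.e. 2 ln cot(eps / 2)."""
    return 2.0 * math.log(1.0 / math.tan(eps / 2.0))

def norm_midpoint(eps, n=200_000):
    """Same integral by the midpoint rule, as an independent numerical check."""
    h = (math.pi - 2.0 * eps) / n
    return h * sum(1.0 / math.sin(eps + (k + 0.5) * h) for k in range(n))

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    # Each decade in eps adds about 2 ln(10) to the normalisation integral.
    print(f"eps={eps:.0e}  normalisation integral={norm_closed_form(eps):.4f}")
```

In this toy setting the normalization grows logarithmically with the cutoff: the weight piled up near the endpoints keeps increasing as the effective truncation is relaxed, so the shape of the truncated posterior is set by the cutoff rather than by the data.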

As an illustration of this problem, the ranges of the priors
for $|T^{+-}|$, Re($P$) and Re($T^{00}$) in Eq. (1) have been
widened by a factor two compared to the ones used in Section
VI. The corresponding posterior PDFs are shown in Fig. 10. One
observes that the posterior PDFs are truncated and that the
posterior PDF for $\alpha$ has its two peaks higher and shifted
towards the endpoints ($0^\circ$ and $180^\circ$) with respect
to the ones of Fig. 2. It is worth noticing that the three
ranges must be extended simultaneously to detect this effect.
FIG. 10: Posterior PDFs in the RI parameterization as obtained
when widening the ranges of the priors for $|T^{+-}|$, Re($P$)
and Re($T^{00}$) by a factor two compared to the ones used in
Section VI. One observes that the posterior PDFs for Re($P$)
and Re($T^{00}$) do not vanish at the extremities of these
extended ranges. In effect, the ranges of the prior must be
extended to infinity, with the result that the posterior for
$\alpha$ becomes an improper PDF, with singularities at
$\alpha = 0^\circ$ and $\alpha = 180^\circ$.